We introduce a bipartite, diluted and frustrated network as a sparse restricted Boltzmann machine, and we show its thermodynamical equivalence to an associative working memory able to retrieve several patterns in parallel without falling into the spurious states typical of classical neural networks. We focus on systems processing in parallel a finite (up to logarithmic growth in the volume) amount of patterns, mirroring the low-storage level of standard Amit-Gutfreund-Sompolinsky theory. Results obtained through statistical mechanics, signal-to-noise technique and Monte Carlo simulations are overall in perfect agreement and carry interesting biological insights. Indeed, these associative networks open new perspectives in the understanding of multitasking features expressed by complex systems, e.g. neural and immune networks.

Neural networks rapidly became the "harmonic oscillators" of parallel processing: neurons, thought of as "binary nodes" (spins) of a network, behave collectively to retrieve information, the latter being spread over the synapses, thought of as the interconnections among nodes. However, the common intuition of parallel processing is not only the underlying parallel work performed by neurons to retrieve, say, an image in a book, but rather, for instance, to retrieve the image and, while keeping the book securely in hand, to notice beyond its edges the room where we are reading, still keeping resources available for further retrievals as a safety mechanism.

Standard Hopfield networks are not able to accomplish this kind of parallel processing [13]. Indeed, spurious states, conveying corrupted information, cannot be looked at as the simultaneous retrieval of several patterns; they are rather an unwanted outcome, leading to a glassy blackout. Such a limit of Hopfield networks can be understood by focusing on the deep connection (in both the direct and the inverse [15] approach) with restricted Boltzmann machines (RBMs). In fact, given a machine with its set of visible (neurons) and hidden (training data) units, one gets, under marginalization over the latter, that the thermodynamic evolution of the visible layer is equivalent to that of a Hopfield network. It follows that an underlying fully connected bipartite RBM necessarily leads to bit strings of length equal to the system size, whose retrieval requires an orchestrated arrangement of the whole set of spins. This implies that no resources are left for further tasks, which is, from a biological point of view, too strong a simplification.

The goal of this paper is to relax this constraint so as to extend standard neural networks toward multitasking capabilities, whose interest goes far beyond the artificial intelligence framework. In particular, starting from an RBM, we perform dilution on its links in such a way that nodes in the external layer are connected to only a fraction of the nodes in the inner layer (fig. 1, left). As we show, this leads to an associative network which, for non-extreme dilutions, is still embedded in a fully connected topology, but whose bit strings encoding the information are sparse (i.e. their entries are $\pm 1$ as well as 0) (fig. 1, right); for relatively low and large degrees of dilution, this ultimately makes the network able to retrieve in parallel without falling into spurious states.

FIG. 1: (Color online) Example of diluted RBM, whose layers are made up of $N = 5$ and $P = 3$ elements (left), and its corresponding weakened associative network (right). In the former, brighter (darker) links have positive (negative) coupling; in the latter, the patterns turn out to be $\xi^1 = [-1, -1, 0, 0, 0]$, $\xi^2 = [+1, +1, 0, -1, 0]$, $\xi^3 = [0, -1, +1, +1, +1]$, and the weight associated to each link $(i, j)$ is $\sum_\mu \xi_i^\mu \xi_j^\mu$.

More precisely, let us denote the $P$ binary spins making up the external layer as $\tau_\mu = \pm 1$, $\mu \in [1, P]$, and the $N$ binary spins making up the internal layer as $\sigma_i = \pm 1$, $i \in [1, N]$. RBMs admit the Hamiltonian description

$$ H(\sigma, \tau; \xi) = -\frac{1}{\sqrt{N}} \sum_{i,\mu}^{N,P} \xi_i^\mu \sigma_i \tau_\mu, \qquad (1) $$

where we called $\xi_i^\mu$ the (quenched) interaction strength between the $i$-th spin of the inner layer and the $\mu$-th spin of the external layer (possibly to be extracted from a proper probability distribution $P(\xi_i^\mu)$, meant as the outcome of a learning process). Usually, one defines $\alpha = \lim_{N \to \infty} P/N$ as the storage value; in this work we deal with the "low storage" regime, i.e. $P \sim \log N$, corresponding to $\alpha = 0$.
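
As a minimal numerical sketch of why the one-body structure of eq. (1) makes the marginalization over one layer straightforward, the snippet below sums the Boltzmann weight over all configurations of the external layer by brute force and compares it with the factorized product of hyperbolic cosines. The function names, the tiny sizes and the value of $\beta$ are illustrative assumptions, not taken from the text; the $1/\sqrt{N}$ normalization follows eq. (1).

```python
import itertools
import numpy as np

def rbm_energy(sigma, tau, xi):
    """H(sigma, tau; xi) = -(1/sqrt(N)) * sum_{i,mu} xi[i, mu] * sigma[i] * tau[mu]."""
    n = sigma.shape[0]
    return -(sigma @ xi @ tau) / np.sqrt(n)

def marginal_weight_brute_force(sigma, xi, beta):
    """Sum exp(-beta * H) over all 2^P configurations of the external layer."""
    p = xi.shape[1]
    total = 0.0
    for tau in itertools.product([-1, 1], repeat=p):
        total += np.exp(-beta * rbm_energy(sigma, np.array(tau), xi))
    return total

def marginal_weight_closed_form(sigma, xi, beta):
    """Product over mu of 2*cosh(beta * h_mu), h_mu = sum_i xi[i, mu] * sigma[i] / sqrt(N)."""
    n = sigma.shape[0]
    fields = (sigma @ xi) / np.sqrt(n)
    return float(np.prod(2.0 * np.cosh(beta * fields)))

# Illustrative (hypothetical) sizes: small enough for exhaustive enumeration.
rng = np.random.default_rng(0)
N, P, beta = 6, 3, 0.7
xi = rng.choice([-1, 1], size=(N, P))
sigma = rng.choice([-1, 1], size=N)
brute = marginal_weight_brute_force(sigma, xi, beta)
closed = marginal_weight_closed_form(sigma, xi, beta)
print(brute, closed)
```

The two numbers coincide because each $\tau_\mu$ couples only to its own field; expanding each $\log 2\cosh$ to second order in that field then produces a quadratic, Hopfield-like form in the $\sigma$ variables.
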

The thermodynamics of the system can be obtained by explicit calculation of the (quenched) free energy $f(\beta)$ via the partition function $Z_{N,P}(\beta)$, which read, respectively, as

$$ f(\beta) = \lim_{N \to \infty} \frac{1}{\beta N} \mathbb{E} \log Z_{N,P}(\beta), \qquad (2) $$

$$ Z_{N,P}(\beta) = \sum_{\sigma} \sum_{\tau} \exp\left[-\beta H(\sigma, \tau; \xi)\right], \qquad (3) $$

$\mathbb{E}$ being the average over the quenched variables $\xi$. A key point here is that the interaction is one-body in each layer, so that marginalizing over one spin variable is straightforward and gives (expanding the hyperbolic cosine up to second order)

FIG. 2: (Color online) Schematic representation of the different regimes exhibited by the system at $\beta \to \infty$; here we fixed $P = 7$, for which $d_{c1} \approx 0.51$, $d_{c2} \approx 0.89$. Solid (dashed) line frames denote global (local) minima. The states depicted correspond to eqs. 7 (parallel state) and 11 (hybrid state).



where we introduced the $P$ Mattis magnetizations $m_\mu = N^{-1} \sum_i \xi_i^\mu \sigma_i$. When $P(\xi_i^\mu = +1) = P(\xi_i^\mu = -1) = 1/2$, the Hamiltonian implicitly defined in eq. 5 recovers exactly the Hopfield model (at a rescaled noise level $\beta^2$), and the pure state ansatz, i.e. $\mathbf{m} = (1, 0, \dots, 0)$ (under permutational invariance), correctly yields the proper minimization of the free energy in the low-noise limit. This means that, once equilibrium is reached, the system configuration $\sigma$ is aligned (under gauge invariance) with one of the patterns $\xi^\mu$; this relaxation is understood as the recovery of a pattern of information.

As anticipated, here we remove the hypothesis of full connection for the bipartite network, diluting randomly its links in such a way that the coupling distribution becomes

$$ P(\xi_i^\mu) = \frac{1-d}{2}\,\delta_{\xi_i^\mu,-1} + \frac{1-d}{2}\,\delta_{\xi_i^\mu,+1} + d\,\delta_{\xi_i^\mu,0}, \qquad (6) $$

where $d \in [0, 1]$ is a proper dilution parameter and $\delta_{i,j}$ is the Kronecker delta. It is easy to see that with this distribution, after marginalizing over one layer as usual, we get an associative network where the $P$ patterns $\xi^\mu$ ($\mu = 1, \dots, P$) contain zeros, on average for a fraction $d$ of their length (a sparse coding can also be found in Willshaw's model [10]). As a result, the pure state ansatz can no longer work. In fact, now the retrieval of a pattern does not employ all spins, and those corresponding to null entries can be used to recall other patterns.
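
A minimal sketch of the diluted patterns just described: entries are drawn in $\{-1, 0, +1\}$ with a probability $d$ of being blank, and the resulting couplings of the associative network are built with the usual Hebbian prescription. The helper names, the sizes, and the $1/N$ normalization with zero self-coupling are assumptions made for illustration only.

```python
import numpy as np

def sample_diluted_patterns(n, p, d, rng):
    """Draw xi[i, mu] in {-1, 0, +1} with P(0) = d and P(+1) = P(-1) = (1 - d) / 2."""
    return rng.choice([-1, 0, 1], size=(n, p), p=[(1 - d) / 2, d, (1 - d) / 2])

def hebbian_couplings(xi):
    """J[i, j] = (1/N) * sum_mu xi[i, mu] * xi[j, mu], with zero self-coupling (assumed)."""
    n = xi.shape[0]
    j = (xi @ xi.T) / n
    np.fill_diagonal(j, 0.0)
    return j

# Hypothetical parameters chosen only for the example.
rng = np.random.default_rng(1)
N, P, d = 2000, 3, 0.3
xi = sample_diluted_patterns(N, P, d, rng)
zero_fraction = float(np.mean(xi == 0))   # close to d for large N
J = hebbian_couplings(xi)                 # still a dense N x N matrix for moderate d
print(zero_fraction)
```

Note that for non-extreme $d$ the matrix `J` remains dense: the dilution acts on the pattern entries, not on the topology of the associative network.
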

In particular, as we will show (see also fig. 2), at a relatively low degree of dilution ($d < d_{c1}$), one pattern, say $\mu = 1$, is perfectly retrieved, while a fraction $d$ of spins is still available and its overlap with any remaining pattern is, on average, $1 - d$; hence, the second best-retrieved pattern, say $\mu = 2$, displays a (thermodynamical and quenched) average of the Mattis magnetization equal to $d(1 - d)$. Proceeding analogously, one finds, for the $k$-th best-retrieved pattern,

$$ m_k = d^{\,k-1}(1 - d). \qquad (7) $$


The overall number of retrieved patterns $K$ therefore corresponds to $\sum_{k=0}^{K-1} (1 - d)\, d^k = 1$, with the cut-off at finite $N$ as $(1 - d)\, d^{K-1} \geq N^{-1}$, due to discreteness. For any fixed and finite $d$, this implies $K \lesssim \log N$, which can be thought of as a "parallel low-storage" regime of neural networks.
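
The hierarchical counting above can be checked numerically with a short sketch: we build a configuration in which each spin copies the first non-blank entry among the patterns, measure its Mattis magnetizations, and compare them with $m_k = d^{k-1}(1-d)$; we also evaluate the finite-size cut-off on $K$. The function names, the sizes, and the choice of setting spins with all-blank entries to $+1$ are illustrative assumptions.

```python
import numpy as np

def parallel_ansatz(d, p):
    """Zero-noise magnetizations m_k = d**(k-1) * (1 - d) for k = 1..p."""
    return np.array([(1 - d) * d ** k for k in range(p)])

def retrievable_patterns(d, n):
    """Largest K with (1 - d) * d**(K - 1) >= 1/n (finite-size cut-off)."""
    k = 0
    while (1 - d) * d ** k >= 1.0 / n:
        k += 1
    return k

def hierarchical_state(xi):
    """sigma_i copies the first non-zero entry of row i; +1 if the row is all zeros (assumed)."""
    sigma = np.ones(xi.shape[0], dtype=int)
    assigned = np.zeros(xi.shape[0], dtype=bool)
    for mu in range(xi.shape[1]):
        take = (~assigned) & (xi[:, mu] != 0)
        sigma[take] = xi[take, mu]
        assigned |= take
    return sigma

# Hypothetical parameters chosen only for the example.
rng = np.random.default_rng(2)
N, P, d = 200_000, 3, 0.3
xi = rng.choice([-1, 0, 1], size=(N, P), p=[(1 - d) / 2, d, (1 - d) / 2])
sigma = hierarchical_state(xi)
m_measured = (xi * sigma[:, None]).mean(axis=0)   # Mattis magnetizations
m_predicted = parallel_ansatz(d, P)
K = retrievable_patterns(d, N)
print(m_measured, m_predicted, K)
```

The measured overlaps agree with the geometric sequence up to fluctuations of order $N^{-1/2}$, and the cut-off grows only logarithmically with $N$.
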

On the other hand, at larger degrees of dilution ($d > d_{c1}$) and $P > 2$, this state is no longer stable, since no magnetization is large enough to yield a field able to align all the related ($\xi_i^\mu \neq 0$) spins; as a result, the system falls into a spurious state where all patterns are partially retrieved, but none exactly. Finally, when dilution is extreme ($1 - d \sim P^{-1}$), the retrieval of (nearly) all patterns can still be accomplished. Whenever the global minimum of the system corresponds to the perfect retrieval of at least one pattern, we refer to "multitasking capabilities" or, analogously, to "parallel retrieval".

Before proceeding with the thermodynamic analysis, we stress that the dilution introduced here is deeply different from the one introduced early on by Sompolinsky [13] or more recently by Coolen et al. [19], who worked out the Hopfield model embedded in random networks, ranging from Erdős-Rényi graphs to small worlds. In those systems, obtained by diluting the Hopfield network directly, the exciting result was the robustness of the (single) retrieval under dilution.

Such different ways of performing dilution, either on links of the associative network or on pattern entries (see eq. 6), yield dramatically different thermodynamic behaviors. To see this, let us consider the field acting on each spin, namely, for the generic $i$-th spin, $\varphi_i = N^{-1} \sum_{\mu=1}^{P} \sum_{j \neq i}^{N} \xi_i^\mu \xi_j^\mu \sigma_j$, and analyze its distribution $P(\varphi|d)$ at zero noise level. When dilution is realized on links in the direct $\sigma$-$\sigma$ network (here $d$ is the fraction of links cut), only an average fraction $1 - d$ of the $N$ available spins participates in $\varphi_i$, in such a way that both the peak and the span of the distribution decrease with $d$ (fig. 3, left). Conversely, when dilution is realized on links in the underlying bi-layer $\sigma$-$\tau$ network (here $d$ is the fraction of null entries in a pattern), as $d > 0$, $P(\varphi|d)$ gets broader and peaked at smaller values of the fields. Indeed, at $\beta$, $N$ and $P$ fixed, when dilution is introduced in the bit strings, couplings are made uniformly weaker (this effect is analogous to a rise in the fast noise), so that the distribution of spin configurations, and consequently also $P(\varphi|d)$, gets broader. For small values of $d$ this effect dominates, while at larger values the overall reduction of the coupling range prevails and fields get not only smaller but also more peaked (fig. 3, right). A topological dilution in the resulting $\sigma$-$\sigma$ network can also be realized in this case, by taking $d$ sufficiently close to 1 [1].

FIG. 3: Distributions of fields for dilution à la Sompolinsky (left) and for our dilution performed on the bipartite network (right), shown for various degrees of dilution, as explained by the legend; for both systems we fixed $\beta \to \infty$, $N = 5000$ and $\alpha = 0.05$. In the former case, $d$ represents the average fraction of cut links; in the latter case, $d$ represents the average fraction of null pattern entries. As $d$ is tuned, on the left, $P(\varphi|d)$ behaves monotonically, corresponding to a Hopfield model embedded on a sparser and sparser random graph, while, on the right, $P(\varphi|d)$ does not behave monotonically and the model is still defined on a fully connected topology.

These different scenarios produce different physics; in particular, the latter field distribution can allow parallel retrieval of patterns. The robustness of these multiple basins of attraction can be checked by signal-to-noise analysis [1] and by solving the statistical mechanics of the model, as sketched in the following. We underline that, as no slow noise due to an extensive amount of patterns is at work ($\alpha = 0$), the replica trick or other techniques designed for disordered systems [1] are not necessary. We introduce a generic vector of Mattis magnetizations $\mathbf{m} = (m_1, \dots, m_P)$ and a density of states $D(\mathbf{m}) = 2^{-N} \sum_{\sigma} \delta(\mathbf{m} - \mathbf{m}(\sigma))$, and we write the free-energy density as

$$ f(\beta) = \frac{\log 2}{\beta} + \frac{1}{\beta N} \log \int d\mathbf{m}\, D(\mathbf{m}) \exp\left(\frac{\beta N}{2} \mathbf{m}^2\right). \qquad (8) $$

After introducing the $P$-component vector $\mathbf{x}$ to allow an integral representation of the $P$ delta functions encoded in the density of states, and after some algebra, this equation becomes

$$ f(\beta) = \frac{1}{\beta N} \log \int d\mathbf{m}\, d\mathbf{x}\, \exp\left(-N \beta f(\mathbf{m}, \mathbf{x})\right), \qquad (9) $$

$$ f(\mathbf{m}, \mathbf{x}) = -\frac{1}{2}\mathbf{m}^2 - i\, \mathbf{x} \cdot \mathbf{m} - \frac{1}{\beta} \left\langle \log 2\cos[\beta\, \boldsymbol{\xi} \cdot \mathbf{x}] \right\rangle_{\xi}, \qquad (10) $$

whose minimization w.r.t. $\mathbf{m}, \mathbf{x}$ gives the standard saddle-point equation $\mathbf{m} = \langle \boldsymbol{\xi} \tanh(\beta\, \boldsymbol{\xi} \cdot \mathbf{m}) \rangle_{\xi}$, whose numerical solution for the case $P = 2$ is shown in fig. 4. When $\beta \to \infty$, stable retrieved states of amplitude $m_1 = 1 - d$ and $m_2 = d(1 - d)$ are found, in agreement with eq. 7. On the other hand, in the presence of (fast) noise, the dependence of the network performance on $d$ gets more complex. In fact, for small $d$, only the first pattern can be retrieved (whenever the fast noise is greater than the signal on $m_2$) and the parallel ansatz $\mathbf{m} = ((1 - d), d(1 - d), d^2(1 - d), \dots)$ recovers the standard pure one (which can be seen as a particular case of the former). This Hopfield-like behavior persists as long as $d(1 - d) < \beta^{-1}$, above which $m_2$ also starts to grow and approaches the related zero-noise curve. At intermediate degrees of dilution, the two magnetizations $m_1, m_2$ collapse and their amplitude decreases monotonically towards zero. When $d \sim 1$, the signal on both magnetizations is smaller than the fast noise, so that retrieval is no longer possible and the system behaves paramagnetically.

FIG. 4: (Color online) Two-pattern analysis: analytical solution at $\beta$ = 10* (left) and at $\beta = 6.66$ (right). All these curves have been checked versus Monte Carlo simulations (not shown) with $N$ up to 10"" spins, with overall perfect agreement.
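
The saddle-point equation $\mathbf{m} = \langle \boldsymbol{\xi} \tanh(\beta\, \boldsymbol{\xi} \cdot \mathbf{m}) \rangle_{\xi}$ can be solved numerically in a few lines. The sketch below uses plain fixed-point iteration, with the average over $\boldsymbol{\xi}$ carried out exactly by enumerating $\{-1, 0, +1\}^P$; the iteration scheme, the starting points and the parameter values are assumptions for illustration, not necessarily the procedure used for the figures.

```python
import itertools
import numpy as np

def pattern_weights(p, d):
    """All xi in {-1,0,+1}^p with probabilities under P(0)=d, P(+1)=P(-1)=(1-d)/2."""
    values = np.array(list(itertools.product([-1, 0, 1], repeat=p)), dtype=float)
    n_zero = np.sum(values == 0, axis=1)
    probs = d ** n_zero * ((1 - d) / 2) ** (p - n_zero)
    return values, probs

def solve_magnetizations(beta, d, p, m0, iters=5000, tol=1e-12):
    """Iterate m <- < xi * tanh(beta * xi . m) >_xi until the update is below tol."""
    values, probs = pattern_weights(p, d)
    m = np.array(m0, dtype=float)
    for _ in range(iters):
        m_new = (probs * np.tanh(beta * (values @ m))) @ values
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    return m

d = 0.3
# Nearly noiseless case: expected to approach (1 - d, d * (1 - d)).
m_cold = solve_magnetizations(beta=1e4, d=d, p=2, m0=[0.9, 0.1])
# High-noise case with beta * (1 - d) < 1: expected to relax to the paramagnetic solution.
m_hot = solve_magnetizations(beta=0.9 / (1 - d), d=d, p=2, m0=[0.5, 0.2])
print(m_cold, m_hot)
```

Fixed-point iteration only finds attractors reachable from the chosen starting point, so scanning several initial conditions is advisable when looking for bifurcations.
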

We now explain these features in more detail. We focus on the critical points corresponding to the vanishing of magnetizations and to bifurcations, again for the simplest case $P = 2$. The self-consistency equations are

$$ m_1 = d(1 - d) \tanh(\beta m_1) + \frac{(1 - d)^2}{2} \left[\tanh(\beta y) + \tanh(\beta x)\right], $$

$$ m_2 = d(1 - d) \tanh(\beta m_2) + \frac{(1 - d)^2}{2} \left[\tanh(\beta y) - \tanh(\beta x)\right], $$

where $y = m_1 + m_2$ and $x = m_1 - m_2$. The critical noise level at which the magnetizations disappear can be obtained by expanding the self-consistency equation for $m_2$, namely $m_2 \sim (1 - d)\beta m_2 + \mathcal{O}(m^3)$. Therefore, from a standard fluctuation analysis, the critical noise level for the two patterns turns out to be $\beta_c = (1 - d)^{-1}$, which recovers $\beta_c = 1$ for the standard Hopfield model away from saturation. Critical values of the noise level corresponding to bifurcations can be obtained by expanding for small $x$, and such calculations can be extended to the case $P > 2$ (see fig. 5); an extensive treatment of the network performances can be found elsewhere.

FIG. 5: (Color online) Retrieval of three (left) and of six (right) patterns at $\beta$ = 10^; for the latter we zoomed on the region of high dilution where bifurcations occur. The number of such bifurcations is, in general, upper bounded by $P - 1$.
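
The critical noise level can be cross-checked by linearizing the saddle-point map around $\mathbf{m} = 0$: its Jacobian is $\beta \langle \xi^\mu \xi^\nu \rangle$, so the paramagnetic solution loses stability when $\beta$ times the largest eigenvalue of the second-moment matrix reaches one. The sketch below, whose function names and parameter values are illustrative assumptions, computes that matrix exactly by enumeration.

```python
import itertools
import numpy as np

def second_moment(p, d):
    """<xi_mu xi_nu> over xi in {-1,0,+1}^p with P(0)=d, P(+1)=P(-1)=(1-d)/2."""
    values = np.array(list(itertools.product([-1, 0, 1], repeat=p)), dtype=float)
    n_zero = np.sum(values == 0, axis=1)
    probs = d ** n_zero * ((1 - d) / 2) ** (p - n_zero)
    return (values * probs[:, None]).T @ values

def critical_beta(p, d):
    """Smallest beta at which m = 0 loses linear stability: beta * lambda_max = 1."""
    lam_max = np.max(np.linalg.eigvalsh(second_moment(p, d)))
    return 1.0 / lam_max

d = 0.4
C = second_moment(2, d)        # equals (1 - d) times the identity
beta_c = critical_beta(2, d)   # equals 1 / (1 - d)
print(C, beta_c)
```

Since the entries of different patterns are independent and symmetric, the matrix is diagonal with entries $1 - d$, which gives $\beta_c = (1 - d)^{-1}$ and reduces to $\beta_c = 1$ at $d = 0$.
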

In general, as mentioned above, the case $P > 2$ can be much more subtle as, even in the noiseless case, it exhibits several phases (see fig. 2): the parallel ansatz (eq. 7) ceases to be stable when $m_1 < \sum_{k>1} m_k$, which corresponds to a critical dilution $d_{c1}$ approaching (exponentially from above) 1/2 in the limit of large $P$. Within the same region, a "hybrid state" $\sigma$, which is a hierarchical mixture of all patterns, is also found to be metastable. More precisely, being $S_i = \sum_\mu \xi_i^\mu$,

$$ \sigma_i = (1 - \delta_{S_i,0})\,\mathrm{sign}(S_i) + \delta_{S_i,0} \left[\xi_i^1 + \delta_{\xi_i^1,0}\, \xi_i^2 + \delta_{\xi_i^1,0}\, \delta_{\xi_i^2,0}\, \xi_i^3 + \dots \right]. \qquad (11) $$


This state gets the global minimum whenever X]i(l ~ 

fe,o)sign(S)Cf /A^ > Ei=7'^^' MP + l)/(-P - k), where 
= 2EJ(1 - d)/2]2'd^-2'(P - fc)!/[/!(/ - 1)!(P - k - 
21 + 1)1] and P is odd. This condition corresponds to 



$d > d_{c2}$, where $d_{c2}$ converges to 1 as $P$ gets larger.
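
As a rough illustration of the first stability threshold, one can take the condition $m_1 < \sum_{k>1} m_k$ at face value, plug in the parallel ansatz $m_k = d^{k-1}(1 - d)$ truncated at $P$ patterns, and locate the dilution at which the two sides coincide. This literal reading, the bisection routine and its tolerances are assumptions of the sketch; the full stability analysis may shift the value slightly.

```python
def stability_gap(d, p):
    """m_1 - sum_{k=2..p} m_k for the parallel ansatz m_k = d**(k-1) * (1 - d)."""
    m = [(1 - d) * d ** k for k in range(p)]
    return m[0] - sum(m[1:])

def critical_dilution(p, lo=1e-9, hi=1 - 1e-9, iters=200):
    """Bisection for the root of stability_gap in (0, 1); meaningful for p > 2."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stability_gap(mid, p) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

dc7 = critical_dilution(7)
dc20 = critical_dilution(20)
print(dc7, dc20)
```

Under this reading the threshold lies slightly above 1/2 for $P = 7$, close to (though not identical with) the value quoted in the caption of fig. 2, and it approaches 1/2 from above as $P$ grows, as stated in the text.
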

As a final remark, we underline that, although the steady state of the current model and an arbitrary spurious state both display non-zero overlap with several patterns, they are still deeply different. In particular, here the retrieval of multiple patterns corresponds to absolute energy minima (in the noiseless case this holds for any $d > 0$) and at least one pattern is exactly retrieved. However, the present model is not devoid of genuine spurious states, which are, in general, mixtures of all patterns. These states can be destabilized by decreasing $\beta$ (analogously to the standard Hopfield model) or, interestingly, by either increasing or decreasing $d$.

In summary, the structural equivalence of associative networks and RBMs allows significant developments, both practical and theoretical. For instance, one can simulate the dynamics of these networks by dealing with an update of $N + P$ spins and a storage of only $NP$ synapses, instead of updating $N$ spins and storing $\sim N^2$ synapses. Moreover, the equivalence suggests that traditional associative networks, where the whole set of neurons needs to be properly arranged in order to achieve retrieval, are not optimal. We overcome this constraint by diluting the links of the RBM, which translates into partially blank patterns. Interestingly, the resulting associative network is not only still able to perform retrieval, but it can actually retrieve several patterns simultaneously, without falling into spurious states. This is an important step toward real autonomous parallel processing and may find applications not only in artificial intelligence, but also in biological contexts. For instance, when applied to the modeling of the adaptive immune system, this result shows that the (lymphocyte) network is able to successfully respond to several pathogens at once.
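
The practical remark on simulating the bipartite system, with $N + P$ spins to update and $NP$ synapses to store, can be sketched as an alternating heat-bath (Gibbs) update of the two layers. The heat-bath rule, the function names and the parameter values are assumptions for illustration; the text does not specify the Monte Carlo scheme actually used.

```python
import numpy as np

def heat_bath_layer(fields, beta, rng):
    """Sample +-1 spins independently with P(+1) = (1 + tanh(beta * field)) / 2."""
    p_up = 0.5 * (1.0 + np.tanh(beta * fields))
    return np.where(rng.random(fields.shape[0]) < p_up, 1, -1)

def gibbs_sweep(sigma, xi, beta, rng):
    """Update the P external spins given sigma, then the N internal spins given tau."""
    n = sigma.shape[0]
    tau = heat_bath_layer((sigma @ xi) / np.sqrt(n), beta, rng)
    sigma = heat_bath_layer((xi @ tau) / np.sqrt(n), beta, rng)
    return sigma, tau

# Hypothetical parameters chosen only for the example.
rng = np.random.default_rng(3)
N, P, d, beta = 400, 3, 0.3, 2.0
xi = rng.choice([-1, 0, 1], size=(N, P), p=[(1 - d) / 2, d, (1 - d) / 2])  # N*P synapses
sigma = rng.choice([-1, 1], size=N)
for _ in range(50):
    sigma, tau = gibbs_sweep(sigma, xi, beta, rng)
m = (xi * sigma[:, None]).mean(axis=0)   # Mattis magnetizations of the internal layer
print(m)
```

Because spins within a layer interact only through the other layer, each layer can be resampled in a single vectorized step, and only the $N \times P$ matrix `xi` has to be kept in memory rather than an $N \times N$ coupling matrix.
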